Learning continuous image representations has recently gained popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images at arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no learnable parameters and it neglects the similarity of the visual features; ii) it has a limited receptive field and cannot ensemble relevant features from a larger area, which are important in an image; iii) it inherently has a gap with real camera imaging since it only depends on the coordinate. To address these issues, this paper proposes a continuous implicit attention-in-attention network, called CiaoSR. We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features. Furthermore, we embed a scale-aware attention in this implicit attention network to exploit additional non-local information. Extensive experiments on benchmark datasets demonstrate that CiaoSR significantly outperforms the existing single image super-resolution (SISR) methods with the same backbone. In addition, the proposed method also achieves state-of-the-art performance on the arbitrary-scale SR task. The effectiveness of the method is also demonstrated in the real-world SR setting. More importantly, CiaoSR can be flexibly integrated into any backbone to improve the SR performance.
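As a rough illustration of the parameter-free local ensemble that the CiaoSR abstract criticizes, the sketch below blends predictions from the four nearest features using area-based weights. The function names and the unit-cell parameterization are assumptions made for illustration; the abstract does not specify the weighting, and this is not the CiaoSR method itself.

```python
def local_ensemble_weights(qx, qy):
    """Area-based weights for the four corner features of a unit cell.

    (qx, qy) is the query position inside the cell, each in [0, 1].
    Each corner is weighted by the area of the rectangle spanned by the
    query and the diagonally opposite corner, so closer corners get
    larger weights. The weights depend only on the coordinate.
    """
    return {
        (0, 0): (1 - qx) * (1 - qy),
        (1, 0): qx * (1 - qy),
        (0, 1): (1 - qx) * qy,
        (1, 1): qx * qy,
    }


def local_ensemble(corner_preds, qx, qy):
    """Blend per-corner predictions (dict keyed like the weights)."""
    weights = local_ensemble_weights(qx, qy)
    return sum(weights[k] * corner_preds[k] for k in weights)


# Hypothetical per-corner predictions for one queried pixel.
preds = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 2.0}
center = local_ensemble(preds, 0.5, 0.5)
```

Because the weights are a fixed function of the query coordinate, nothing here can adapt to feature similarity, which is the gap the learned attention weights are meant to close.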
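Existing video denoising methods typically assume that noisy videos are degraded from clean videos by adding Gaussian noise. However, deep models trained under this degradation assumption inevitably perform poorly on real videos because of the degradation mismatch. Although some studies attempt to train deep models on pairs of noisy and noise-free videos captured by cameras, such models work well only for specific cameras and do not generalize well to other videos. In this paper, we propose to lift this limitation and focus on the problem of general real video denoising, with the aim of generalizing well to unseen real-world videos. We tackle this problem by first investigating the common behaviors of video noise and observing two important characteristics: 1) downscaling helps to reduce the noise level in the spatial domain; and 2) information from adjacent frames helps to remove the noise of the current frame in the temporal domain. Motivated by these two observations, we propose a multi-scale recurrent architecture that makes full use of both characteristics. Second, we propose a synthetic real noise degradation model that randomly shuffles different noise types to train the denoising model. With a synthesized and enriched degradation space, our degradation model helps to bridge the distribution gap between training data and real-world data. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance and better generalization ability than existing methods on both synthetic Gaussian denoising and practical real video denoising.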
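The first observation in the video denoising abstract (downscaling lowers the spatial noise level) can be checked with a small experiment. The sketch below assumes i.i.d. zero-mean Gaussian noise and 2x2 average pooling as the downscaling operator; both are illustrative choices, not details taken from the paper.

```python
import random
import statistics


def make_noise(height, width, sigma, seed=0):
    """Generate an i.i.d. zero-mean Gaussian noise image as nested lists."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(width)] for _ in range(height)]


def avg_pool_2x2(img):
    """Downscale by a factor of 2 by averaging non-overlapping 2x2 blocks."""
    height, width = len(img), len(img[0])
    return [
        [(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
         for j in range(0, width - 1, 2)]
        for i in range(0, height - 1, 2)
    ]


def noise_std(img):
    """Standard deviation over all pixels of the image."""
    return statistics.pstdev([v for row in img for v in row])


noise = make_noise(128, 128, sigma=10.0)
ratio = noise_std(avg_pool_2x2(noise)) / noise_std(noise)
```

Averaging four independent samples divides the standard deviation by two, so `ratio` should come out close to 0.5. Real camera noise is spatially correlated, so the reduction there is typically smaller.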
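Reference-based image super-resolution (RefSR) aims to exploit auxiliary reference (Ref) images to super-resolve low-resolution (LR) images. Recently, RefSR has attracted great attention because it provides an alternative way to surpass single image SR. However, the RefSR problem poses two critical challenges: (i) it is difficult to match the correspondence between LR and Ref images when they differ significantly; (ii) it is very challenging to transfer the relevant textures from Ref images to compensate for the missing details in LR images. To address these issues, this paper proposes a deformable attention Transformer, namely DATSR, with multiple scales, each of which consists of a texture feature encoder (TFE) module, a reference-based deformable attention (RDA) module and a residual feature aggregation (RFA) module. Specifically, TFE first extracts features of the LR and Ref images that are insensitive to image transformations (e.g., brightness), RDA then exploits multiple relevant textures to compensate the LR features with more information, and RFA finally aggregates the LR features and relevant textures to obtain a more visually pleasant result. Extensive experiments demonstrate that our DATSR achieves state-of-the-art performance on benchmark datasets, both quantitatively and qualitatively.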
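In this paper, we study a practical space-time video super-resolution (STVSR) problem, which aims to generate a high-framerate, high-resolution sharp video from a low-framerate, low-resolution blurry video. This problem often arises when a fast dynamic event is recorded with a low-framerate, low-resolution camera, and the captured video suffers from three typical issues: i) motion blur occurs due to object/camera motion during the exposure time; ii) motion aliasing is unavoidable when the temporal frequency of the event exceeds the Nyquist limit of temporal sampling; iii) high-frequency details are lost because of the low spatial sampling rate. These issues can be alleviated by a cascade of three separate sub-tasks (video deblurring, frame interpolation, and super-resolution), but such a cascade fails to capture the spatial and temporal correlations among video sequences. To address this, we propose an interpretable STVSR framework that leverages both model-based and learning-based methods. Specifically, we formulate STVSR as a joint video deblurring, frame interpolation, and super-resolution problem, and solve it as two sub-problems in an alternating manner. For the first sub-problem, we derive an interpretable analytical solution and use it as a Fourier data transform layer. Then, we propose a recurrent video enhancement layer for the second sub-problem to further recover high-frequency details. Extensive experiments demonstrate the superiority of our method in terms of quantitative metrics and visual quality.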
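The motion aliasing issue in the STVSR abstract follows from the sampling theorem: a periodic motion faster than half the frame rate is recorded at a lower apparent frequency. The sketch below computes that apparent frequency for an idealized single-frequency motion; the function name and the example numbers are illustrative assumptions, not taken from the paper.

```python
def aliased_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a sinusoid after uniform sampling.

    The sampled signal is indistinguishable from one whose frequency is
    folded into the range [0, f_sample / 2], the Nyquist band.
    """
    return abs(f_signal - f_sample * round(f_signal / f_sample))


# A 25 Hz periodic motion recorded at 30 frames per second (Nyquist limit 15 Hz).
apparent_fast = aliased_frequency(25, 30)
# A 10 Hz motion at the same frame rate lies below the Nyquist limit.
apparent_slow = aliased_frequency(10, 30)
```

In this idealized setting the 25 Hz motion appears as a 5 Hz motion, while the 10 Hz motion is preserved.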
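Deep neural networks have exhibited remarkable performance in image super-resolution (SR) tasks by learning a mapping from low-resolution (LR) images to high-resolution (HR) images. However, SR is typically an ill-posed problem, and existing methods have several limitations. First, the space of possible SR mappings can be extremely large, since many different HR images can be downsampled to the same LR image. As a result, it is hard to directly learn a promising SR mapping from such a large space. Second, it is often inevitable to develop very large models with extremely high computational cost to obtain promising SR performance. In practice, model compression techniques can be used to obtain compact models by reducing model redundancy. Nevertheless, because the SR mapping space is extremely large, existing model compression methods struggle to identify the redundant components accurately. To alleviate the first challenge, we propose a dual regression learning scheme to reduce the space of possible SR mappings. Specifically, in addition to the mapping from LR to HR images, we learn an additional dual regression mapping to estimate the downsampling kernel and reconstruct the LR images. In this way, the dual mapping acts as a constraint that reduces the space of possible mappings. To address the second challenge, we propose a lightweight dual regression compression method that reduces model redundancy at both the layer level and the channel level based on channel pruning. Specifically, we first develop a channel number search method that minimizes the dual regression loss to determine the redundancy of each layer. Given the searched channel numbers, we further exploit the dual regression scheme to evaluate the importance of channels and prune the redundant ones. Extensive experiments show the effectiveness of our method in obtaining accurate and efficient SR models.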
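A minimal sketch of how a dual regression constraint can be added to an SR objective, assuming 1-D signals, an L1 loss, a fixed weight `lam`, and toy stand-ins for the primal mapping (nearest-neighbor upsampling) and the dual mapping (pairwise averaging). None of these choices come from the abstract, which does not state the exact loss; they only illustrate the idea of penalizing SR outputs that do not map back to the LR input.

```python
def upsample_x2(lr):
    """Hypothetical primal mapping P: nearest-neighbor x2 upsampling."""
    return [v for v in lr for _ in range(2)]


def downsample_x2(hr):
    """Hypothetical dual mapping D: average each non-overlapping pair."""
    return [(hr[i] + hr[i + 1]) / 2.0 for i in range(0, len(hr) - 1, 2)]


def l1(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dual_regression_loss(lr, hr, lam=0.1):
    """Primal loss on the SR output plus a weighted dual (LR reconstruction) loss."""
    sr = upsample_x2(lr)
    primal = l1(sr, hr)                  # how far P(lr) is from the HR target
    dual = l1(downsample_x2(sr), lr)     # how far D(P(lr)) is from the LR input
    return primal + lam * dual, primal, dual


hr = [0.0, 2.0, 4.0, 6.0]
lr = downsample_x2(hr)
total, primal, dual = dual_regression_loss(lr, hr)
```

In this toy setup the dual term is zero because the upsampled signal averages back to the LR input exactly; with a learned primal network, the dual term would penalize SR outputs that are inconsistent with the observed LR image.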
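Video restoration aims to restore multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases: they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which leads to different merits and drawbacks. Typically, the former has the advantage of temporal information fusion, but it suffers from a large model size and intensive memory consumption; the latter has a relatively small model size because it shares parameters across frames, but it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework, which achieves a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, the features of different frames are jointly updated with implicit feature aggregation. Across different clips, guided deformable attention is designed for clip-to-clip alignment; it predicts multiple relevant locations from the whole inferred clip and aggregates their features with the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory, and runtime.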
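The clip-level recurrence described in the RVRT abstract (frames within a clip handled together, each clip conditioned on the previously inferred clip) can be sketched with scalar "frames". The `update_clip` function below is a hypothetical stand-in for the per-clip network; it does not implement guided deformable attention or any other RVRT component.

```python
def split_into_clips(frames, clip_len):
    """Split a frame sequence into consecutive clips of at most clip_len frames."""
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


def update_clip(clip, prev_features):
    """Hypothetical per-clip update: all frames of the clip are processed
    together and conditioned on a summary of the previous clip's features."""
    context = sum(prev_features) / len(prev_features) if prev_features else 0.0
    return [frame + context for frame in clip]


def recurrent_restore(frames, clip_len=2):
    """Globally recurrent over clips, parallel within each clip."""
    features, prev = [], None
    for clip in split_into_clips(frames, clip_len):
        prev = update_clip(clip, prev)   # depends only on the previous clip
        features.extend(prev)
    return features


out = recurrent_restore([1.0, 2.0, 3.0, 4.0, 5.0], clip_len=2)
```

The number of sequential steps scales with the number of clips rather than the number of frames, which is the source of the trade-off between parallelism and parameter sharing that the abstract describes.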
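Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Unlike single image restoration, video restoration generally requires exploiting temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this by using a sliding window strategy or a recurrent architecture, which is either restricted to frame-by-frame restoration or lacks long-range modeling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modeling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping is used to further fuse information from neighboring frames through parallel feature warping. Experimental results on five tasks, including video super-resolution, video deblurring, video denoising, video frame interpolation and space-time video super-resolution, demonstrate that VRT outperforms the state-of-the-art methods by large margins ($\textbf{up to 2.16dB}$) on fourteen benchmark datasets.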
Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers, which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14∼0.45dB, while the total number of parameters can be reduced by up to 67%.
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks that trigger misclassification by DNNs but may be imperceptible to humans. Adversarial defense has been an important way to improve the robustness of DNNs. Existing attack methods often construct adversarial examples by relying on metrics such as the $\ell_p$ distance to perturb samples. However, these metrics can be insufficient for conducting adversarial attacks because of their limited perturbations. In this paper, we propose a new internal Wasserstein distance (IWD) to capture the semantic similarity of two samples, which helps to obtain larger perturbations than currently used metrics such as the $\ell_p$ distance. We then apply the internal Wasserstein distance to perform adversarial attack and defense. In particular, we develop a novel attack method relying on IWD to calculate the similarities between an image and its adversarial examples. In this way, we can generate diverse and semantically similar adversarial examples that are more difficult for existing defense methods to defend against. Moreover, we devise a new defense method relying on IWD to learn robust models against unseen adversarial examples. We provide both thorough theoretical and empirical evidence to support our methods.
Decompilation aims to transform a low-level program language (LPL) (e.g., binary file) into its functionally-equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.